The two-stage drop-the-losers design provides a framework for selecting the most promising of K 
experimental treatments in stage one, in order to test it against a control in a confirmatory analysis at 
stage two. The multistage drop-the-losers design is both a natural extension of the original two-stage 
design, and a special case of the more general framework of Stallard & Friede (2008) (Stat. Med. 
27, 6209-6227). It may be a useful strategy if deselecting all but the best performing treatment after 
one interim analysis is thought to pose an unacceptable risk of dropping the truly best treatment. 
However, estimation has yet to be considered for this design. Building on the work of Cohen & 
Sackrowitz (1989) (Stat. Prob. Lett. 8, 273-278), we derive unbiased and near-unbiased estimates in 
the multistage setting. Complications caused by the multistage selection process are shown to hinder 
a simple identification of the multistage uniform minimum variance conditionally unbiased estimate 
(UMVCUE); two separate but related estimators are therefore proposed, each containing some of the 
UMVCUE's theoretical characteristics. For a specific example of a three-stage drop-the-losers trial, we 
compare their performance against several alternative estimators in terms of bias, mean squared error, 
confidence interval width and coverage. 

Keywords: Bias-adjusted estimation; Drop-the-losers design; Treatment selection; 
UMVCUE. 

1 Introduction 

The maximum likelihood estimate (MLE) of the treatment effect is often reported as standard at the 
end of a multistage trial. It is of course a precise and readily available estimator, but since it ignores the 
trial's sequential nature it is generally biased (Whitehead, 1986) and considerable research has been 
conducted into estimation methods that address this fact. Although many bias adjusted estimation 
procedures have been proposed, and unbiasedness is certainly not the only characteristic by which 
an estimator can be judged, the only way to achieve an efficient and "purely" unbiased estimate is 
to execute the following procedure: (i) identify an unbiased estimate based on part of the data, Y 
say; (ii) identify complete, sufficient statistics for the parameter in question, Z say; and (iii) employ 
the Rao-Blackwell improvement formula to obtain $E[Y|Z]$, the uniform minimum variance unbiased 
estimate (UMVUE). 

There are two distinct applications of the Rao-Blackwell approach to estimation in multi-stage trials. 
The first approach, as pioneered by Emerson & Fleming (1990) and further clarified by Liu & Hall 
(1999), applies to a group sequential trial with one active treatment and one control arm that stops 
when conclusive evidence (for or against the efficacy of the treatment) is first observed. The stage at 

* Corresponding author: e-mail: jack.bowden@mrc-bsu.cam.ac.uk 



© 2014 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, 
provided the original work is properly cited. 



Biometrical Journal 56 (2014) 2 






which the trial stops, M say, is a random variable. Given M, a sufficient statistic of the data, Z, 
and a truncation-adaptable stopping rule (Liu & Hall, 1999) one calculates the expectation of the first 
stage data, $Y_1$ say, given the pair (M, Z) to obtain the truncation-adaptable UMVUE. We refer to 
this approach as unconditional because, in a three-stage trial, for example, it produces an estimate of 
the treatment effect regardless of whether the trial stops at stage one, two, or three and it is therefore 
unbiased by definition when one averages across all possible realizations of the sequential trial. 

However, in some circumstances one may feel it more appropriate to develop estimators that are 
UMVUE conditional on the occurrence of a particular subset of trial realizations. For example, in 
two-stage "drop-the-losers" trials, the best performing of K experimental treatments is selected after 
the first stage before being tested in isolation against a control group in a confirmatory analysis in stage 
two. The stage two estimate of the selected treatment, $Y_2$ say, is unbiased, and Cohen & Sackrowitz 
(1989) obtain the UMVUE of the selected treatment, conditional on the order of the stage one 
treatment arm estimates, by calculating $E[Y_2 | Z, Q]$, Q denoting the stage 1 order statistic condition. 
We call this an example of a UMVCUE (C for conditional). UMVCUEs have also been proposed for 
use more generally in two-stage trials evaluating a single treatment that allow early stopping for futility 
only. This is because a strong argument can be made that estimation of the treatment's effect is only 
important when the trial does in fact continue to the final stage; see Pepe et al. (2009), Koopmeiners 
et al. (2012), and Kimani et al. (2013) for recent examples. 

The two-stage drop-the-losers design has been the focus of much attention in the research literature. 
Sampson & Sill (2005) and Wu et al. (2010) consider hypothesis testing methodology for this design 
whereas Sill & Sampson (2007), Bowden & Glimm (2008), and Bebu et al. (2010) target point estimation. 
More generally, the vast majority of adaptive trial designs have also followed a two-stage strategy in 
the following sense: Design adaptations (such as subgroup selection or sample size adjustment) are 
made at the first interim analysis. The trial then proceeds under a fixed design (possibly with additional 
interim analyses) for its remaining duration. However, it is questionable whether a two-stage approach 
is always the best strategy. As an example, an adaptive trial was recently conducted into a treatment for 
maintaining lung function in patients with chronic obstructive pulmonary disease (COPD) (Barnes 
et al., 2010). The trial aimed to use its first stage data to select the most promising doses of a new drug, 
before testing them against a placebo in a confirmatory analysis at stage two. In the event, two out of 
the four doses were selected for continuation to the second stage. 

By selecting two doses (instead of one), the study decreased its chances of accidentally discarding 
a dose that would ultimately be successful. Nonetheless, evaluating two experimental treatments in 
the confirmatory analysis is more challenging than for one, since multiple testing corrections must 
be applied. Furthermore, if the best performing of the two experimental treatments ends up being 
recommended, then its estimate may subsequently be queried as "biased". There may, therefore, have 
been an advantage in allowing selection of the single most promising treatment-dose to occur after 
several interim analyses. With this in mind, a clear multistage analog for the two-stage drop-the-losers 
design exists: rather than selecting the best of K treatments at a single interim analysis, selection could 
be achieved by dropping a predetermined number of treatments at each stage due to (relative) poor 
performance until only one remained. The downside of inserting additional interim analyses into any 
clinical trial is clearly an increased administrative burden. Yet, in the context of a drop-the-loser trial, 
doing so can markedly increase the probability of selecting the truly best treatment for a given number 
of patients, as is shown in Section 2. 

The multistage drop-the-losers design approach is actually a special case of a more general design 
framework for testing multiple treatments proposed by Stallard & Friede (2008). In their paper, the 
decision to drop a treatment need not be dictated by a predetermined rule based on efficacy data but, 
if it is, the familywise error rate of the trial can be controlled in the strong sense. Stallard & Friede 
(2008) do not touch on the issue of estimation for their general design, although it is highlighted by 
them as an area for future methods development. In this paper we focus on estimation, and attempt to 
derive the UMVCUE for the specific drop-the-losers case. A different derivation to that of Cohen & 
Sackrowitz (1989) is used. It is perhaps less intuitive than the original, but generalizes to an arbitrary 
number of stages far more easily. Furthermore, allowing additional stages of selection requires an 
increasingly strong and unexpected form of conditioning to be employed. So in order to best elucidate 
the approach we start in Section 2 by considering the extension to the three-stage case and the general 
J-stage formula is left as an appendix. In Section 4, we apply our estimation proposals to some specific 
three-stage trial examples, and compare their performance against several other estimation strategies. 
Interval estimation for the selected treatment is also considered. We discuss the issues raised and point 
to future research in Section 5. 

2 The three-stage drop-the-losers design 

Imagine a three-stage trial initially involving K experimental treatments and a control group. The 
purpose of the first two stages is to identify which treatment has the most beneficial true effect, as 
reflected via an appropriate outcome measure. Throughout this paper we will assume that higher 
values of the outcome are more desirable. At the end of stage one, $K - L$ experimental treatments are 
dropped and then the best of the L remaining experimental treatments is selected at the end of stage 
two for confirmatory testing against the control group in stage three. We will refer to such a three-stage 
design as a "K:L:1" trial. Let $k \in \{0, \ldots, K\}$ index the initial full set of treatments with $k = 0$ referring 
to the control group. Let $X_{ik}$ denote the response of the i-th patient on treatment k, which is normally 
distributed with mean $\mu_k$ and variance $v_k^2$. At stage j ($j = 1, 2, 3$), n subjects are randomized to each 
treatment arm still active in the trial and the experimental treatments are evaluated according to a test 
statistic of the form: 



Without loss of generality we will refer to the true mean of the selected treatment as $\mu_1$; the reasons 
for this subscript notation are given in due course. At the conclusion of the trial it would be natural to 
primarily focus on testing the null hypothesis $H_0: \mu_1 - \mu_0 \le 0$. In this paper, we focus on the task of 
estimating the contrast $\mu_1 - \mu_0$. In the next subsection, we show empirically that whilst K:L:1 trials 
can provide more power to select the truly best treatment for confirmatory testing (compared to an 
analogous two-stage drop-the-losers trial) the MLE for $\mu_1$ can be substantially biased. Section 2.1 
serves merely to illustrate the problem. Further details regarding the notation, as well as the design 
and analysis of drop-the-losers trials, are covered from Section 2.2 onwards. 

2.1 Initial motivation 

Trial data are simulated under a 3:2:1 design. Each treatment arm at each stage is allotted $n = 60$ 
patients and the variance of each patient's individual treatment effect, $v_k^2$, is 50. At the end of stage 2, 
300 patients have been allocated to the experimental treatments. This is contrasted with a traditional 
two-stage "3:1" trial, where the same number of patients (100 per arm) are used to select the best 
performing treatment after stage one. We assume that the vector of true mean effects for the three 
treatments is $(1, \Delta, 2)$. Figure 1 shows the proportion of simulations for which the truly best treatment 
(with mean equal to 2) is selected, as $\Delta$ is varied between 0 and 2. Each point is the average of 50,000 
simulations. The 3:2:1 design gives a marginally higher power of selecting the best treatment than the 
3:1 design. The results of two further simulation scenarios are also shown in Fig. 1 (left), namely: 

• A 5:3:1 trial (75 per arm per stage) compared with a 5:1 trial (120 per arm). True vector of 
treatment means $(1, 1, 1, \Delta, 2)$. 600 patients used to select best treatment in total. 

• A 6:2:1 trial (75 per arm per stage) compared with a 6:1 trial (100 per arm). True vector of 
treatment means $(1, 1, 1, 1, \Delta, 2)$. 600 patients used to select best treatment in total. 
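
The comparison described above can be sketched as a small Monte Carlo experiment. This is only an illustration under stated assumptions, not the simulation code used for Fig. 1: the function names `select_arm` and `power_to_select_best` are hypothetical, each stage-wise arm mean is drawn directly as a normal variate with variance $v^2/n$ (with $v^2 = 50$), and treatments are ranked on their cumulative means at each interim analysis.

```python
import random

def select_arm(means, n_per_stage, keep, v2=50.0, rng=random):
    """Simulate one drop-the-losers selection; return the index of the surviving arm.

    keep lists how many arms survive each interim analysis, e.g. (2, 1) for 3:2:1
    and (1,) for a two-stage 3:1 design.
    """
    sd = (v2 / n_per_stage) ** 0.5        # sd of one stage-wise arm mean
    active = list(range(len(means)))
    totals = [0.0] * len(means)           # running sum of stage-wise means
    for n_keep in keep:
        for k in active:
            totals[k] += rng.gauss(means[k], sd)
        # Equal n per stage: ranking cumulative sums equals ranking cumulative MLEs.
        active = sorted(active, key=lambda k: totals[k], reverse=True)[:n_keep]
    return active[0]

def power_to_select_best(means, n_per_stage, keep, n_sims=20000, seed=1):
    """Proportion of simulated trials in which the truly best arm is selected."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda k: means[k])
    hits = sum(select_arm(means, n_per_stage, keep, rng=rng) == best
               for _ in range(n_sims))
    return hits / n_sims
```

For example, `power_to_select_best((1.0, 1.0, 2.0), 60, (2, 1))` and `power_to_select_best((1.0, 1.0, 2.0), 100, (1,))` give the 3:2:1 and 3:1 selection probabilities at $\Delta = 1$ with the same 300 patients.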




(1) 
Figure 1 Left: power to select the truly best treatment for various two and three-stage drop-the-losers 
designs as a function of $\Delta$. Right: bias and coverage of the MLE in a K:2:1 design, as a function of K. 



For these scenarios, the difference in power between the two and three-stage designs is now much 
more pronounced, and the case for switching to a three-stage design is stronger. Increasing the sample 
size of the trial (i.e., reducing the variance of the estimates) will always increase the probability of 
selecting the best treatment (assuming that one is truly better than the others). In our example the 
treatment with mean effect 2 is always best. For increasing sample sizes, the power curves approach 1. 

Note that, if equal numbers of patients were randomized to the control group as are to the 
experimental arms at each stage, the two and three-stage designs featured above would have different 
numbers of controls at different stages. However, since we have assumed that the trials always continue 
to the final stage, the control group data only impacts on the final stage analysis; it therefore does not 
affect the probability of selecting the best treatment. 

Trial data are now simulated under a K:2:1 design for $K = 3, \ldots, 8$ and with $n = v_k^2 = 50$ for all 
k. All treatments are assumed to have no (zero) effect. Figure 1 (right) plots the bias and coverage 
of the MLE for $\mu_1$ as a function of K. One can see that as the number of experimental treatments 
increases, the bias in the MLE increases and the coverage of its 95% confidence interval starts to fall 
well below its nominal level. In the context of a two-stage drop-the-losers trial, the bias in the MLE of 
the selected treatment is maximized when all treatments have the same effect (see Carreras & Brannath, 
2013, for a proof). We conjecture that this is true under three-stage drop-the-losers selection as well. 
Despite the fact that Fig. 1 (right) most probably represents a worst case scenario in terms of the MLE's 
performance, there is certainly room for alternative estimation strategies to be developed. 
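
The growth of the selection bias with K can be illustrated with a short simulation. The sketch below is not the code behind Fig. 1; it assumes $n = v^2 = 50$, so that every stage-wise estimate has unit variance, assumes all true effects are zero, and uses the hypothetical function names `mle_one_trial` and `mle_bias`. With equal stage-wise variances the MLE of the selected treatment is the simple average of its three stage-wise estimates.

```python
import random

def mle_one_trial(K, rng, sigma=1.0):
    """One K:2:1 trial with all true effects zero; returns the MLE of the selected arm."""
    y1 = [rng.gauss(0.0, sigma) for _ in range(K)]                 # stage 1, all K arms
    kept = sorted(range(K), key=lambda k: y1[k], reverse=True)[:2]
    y2 = {k: rng.gauss(0.0, sigma) for k in kept}                  # stage 2, two arms
    winner = max(kept, key=lambda k: y1[k] + y2[k])                # best cumulative mean
    y3 = rng.gauss(0.0, sigma)                                     # stage 3, selected arm
    return (y1[winner] + y2[winner] + y3) / 3.0

def mle_bias(K, n_sims=20000, seed=1):
    """Monte Carlo bias of the MLE; the true mean is zero, so the bias is the mean."""
    rng = random.Random(seed)
    return sum(mle_one_trial(K, rng) for _ in range(n_sims)) / n_sims
```

Calling `mle_bias(K)` for K = 3, ..., 8 traces out an increasing bias curve of the kind described above.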



2.2 Notation for the estimation problem 

For simplicity we will now assume that the within treatment variance term $v_k^2$ is constant across 
treatments, and is hence referred to as $v^2$. This means that ranking the treatments via the test statistic 
in Eq. (1) over the first two stages is equivalent to ranking by the values of the experimental treatment 
terms only. That is, the cumulative control group data at stage j and the common square 
root term $\sqrt{jn/2v^2}$ can be ignored, leaving the experimental treatment MLEs to be directly compared 
head-to-head. 






www.biometrical-journal.com 






J. Bowden and E. Glimm: Multistage UMVCUE 



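[Figure 2: rows show the Stage 1 data ($Y_{11}, Y_{21}, \ldots, Y_{L1}$ and the dropped treatments $L+1, \ldots, K$), the Stage 2 data ($Y_{12}, Y_{22}, \ldots, Y_{L2}$) and the Stage 3 data ($Y_{13}$); treatments $1, \ldots, L$ are arranged by the stage 2 ordering and treatments $L+1, \ldots, K$ by the stage 1 ordering.]

Figure 2 Schematic diagram showing the selection process of the three-stage trial. 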



Let $Y_{kj}$ ($k = 1, \ldots, K$, $j = 1, 2, 3$) represent the estimate for the mean effect of treatment k using 
only those subjects recruited at stage j. In order to add more flexibility, let the originally fixed number 
of subjects per-arm per-stage, n, now equal $n_j$, so this number can vary across stages if required (as in 
Bowden & Glimm (2008)). Letting $n^*_j = \sum_{l=1}^{j-1} n_l$: 

$$ Y_{kj} = \frac{1}{n_j} \sum_{i=n^*_j+1}^{n^*_j+n_j} X_{ik} \sim N(\mu_k, \sigma_j^2), \qquad \sigma_j^2 = v^2/n_j. $$

We base all subsequent mathematical derivation on the $Y_{kj}$ random variables (or transformations 
of them), leaving the individual patient data $X_{ik}$ notation behind. Let $\psi$ represent the vector 
$(\psi_1, \ldots, \psi_K)$ where $\psi_k$ is the ranking of treatment k using our design framework. Denote the event: 
$\psi_1 = 1, \psi_2 = 2, \ldots, \psi_K = K$ by the letter Q. It is useful to label the $Y_{kj}$s so that event Q is satisfied. 
That is, so that $Y_{kj}$ refers to the treatment that is ultimately ranked as kth best at the end of the trial. 
However, it is important not to confuse or equate this convenient labeling with explicitly conditioning 
on event Q; this is done in Sections 3 and 3.1. 
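At the end of stage 1, the top L treatments are kept and the remainder are dropped. So, the stage 
one statistics of the L retained treatments all exceed $Y_{L+1,1}$, and the statistics $Y_{k1}$ of the dropped 
treatments (which are identical to the stage one MLEs) satisfy: 

$$ Y_{k1} > Y_{k+1,1} \quad \text{for } k = L+1, \ldots, K-1. \qquad (2) $$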



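This enables $Y_{k1}$ and $\mu_k$ to be associated with treatments $k = L+1, \ldots, K$ after stage one. At stage 
two, the remaining L treatments are ranked according to their cumulative MLEs. So, assume that the 
remaining stage one and stage two statistics $(Y_{11}, \ldots, Y_{L1})$ and $(Y_{12}, \ldots, Y_{L2})$ satisfy: 

$$ \frac{\sigma_2^2 Y_{k1} + \sigma_1^2 Y_{k2}}{\sigma_1^2 + \sigma_2^2} > \frac{\sigma_2^2 Y_{k+1,1} + \sigma_1^2 Y_{k+1,2}}{\sigma_1^2 + \sigma_2^2} \quad \text{for } k = 1, \ldots, L-1. \qquad (3) $$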



This enables $(Y_{k1}, Y_{k2})$ and $\mu_k$ to be associated with treatments $k = 1, \ldots, L$ at stage 2. A schematic 
diagram of this selection process is shown in Fig. 2. Ultimately, the selected experimental treatment 
is therefore associated with stage-wise statistics $(Y_{11}, Y_{12})$ and additionally a $Y_{13} \sim N(\mu_1, \sigma_3^2)$, the sole 
experimental treatment tested at stage 3. Note that although $(Y_{11}, Y_{12})$ fulfills the inequalities in 
Eqs. (2) and (3), this does not directly imply that $Y_{11} > Y_{21}$ or $Y_{12} > Y_{22}$. Note further that 
although the truly best treatment has the highest chance of being selected, $\mu_1$ is not equivalent to 
$\max\{\mu_k, k = 1, \ldots, K\}$. 

At the third and final stage of the trial, we seek an efficient unbiased estimate of $\mu_1 - \mu_0$, where 
$\mu_0$ represents the mean parameter of the control group. As previously mentioned, the control group 
always progresses to the final stage of the trial, so $\mu_0$ can be trivially and unbiasedly estimated using 
all of the relevant data via its MLE. By contrast, the parameter $\mu_1$ is much more elusive; at the trial 
outset it is a discrete random variable with K possible values and, conditional on Q, it is not unbiasedly 
estimated by its final stage three MLE: 

$$ \hat{\mu}_1^{MLE} = \frac{Y_{11}/\sigma_1^2 + Y_{12}/\sigma_2^2 + Y_{13}/\sigma_3^2}{1/\sigma_1^2 + 1/\sigma_2^2 + 1/\sigma_3^2}, \qquad (4) $$

since $Y_{11}$ and $Y_{12}$ are conditionally biased with respect to $\mu_1$. We therefore focus on bias-adjusted 
estimation of $\mu_1$ for the remainder of the paper. 



3 Unbiased and bias-adjusted estimation of $\mu_1$ 

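Let $\mathbf{Y}'_1 = (Y_{21}, \ldots, Y_{K1})$ and $\mathbf{Y}'_2 = (Y_{22}, \ldots, Y_{L2})$ represent the vectors of mutually independent 
unselected treatment estimates at stages one and two. Further, let $\mathbf{Y}_1$ and $\mathbf{Y}_2$ represent the complete set 
of normally distributed statistics at stage one and two so that, for example, $\mathbf{Y}_1 = (Y_{11}, \mathbf{Y}'_1)$. The joint 
distribution, $f(.)$, of the complete data $Y_{13}, \mathbf{Y}_1, \mathbf{Y}_2$ satisfies 

$$ f(Y_{13}, \mathbf{Y}_1, \mathbf{Y}_2) \propto \exp\left\{ -\frac{1}{2} \sum_{j=1}^{3} \left( \frac{Y_{1j} - \mu_1}{\sigma_j} \right)^2 \right\} \Pi $$
$$ = \exp\left\{ -\frac{1}{2} \left[ \left( \frac{Z_1 - \frac{B^2}{A}\mu_1}{B} \right)^2 + \frac{(Y_{11} - Y_{12})^2}{\sigma_1^2 + \sigma_2^2} + \left( \frac{(\sigma_1^2 + \sigma_2^2) Y_{13} - \sigma_2^2 Y_{11} - \sigma_1^2 Y_{12}}{\sqrt{\sigma_1^2 + \sigma_2^2}\, B} \right)^2 \right] \right\} \Pi \qquad (5) $$

where $A = \sigma_1\sigma_2\sigma_3$, $B^2 = \sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2$, 

$$ Z_1 = \frac{\sigma_2\sigma_3}{\sigma_1} Y_{11} + \frac{\sigma_1\sigma_3}{\sigma_2} Y_{12} + \frac{\sigma_1\sigma_2}{\sigma_3} Y_{13} $$

and $\Pi$ equals all terms involving $(\mathbf{Y}'_1, \mathbf{Y}'_2)$. The trio 
$Z_1, \mathbf{Y}'_1, \mathbf{Y}'_2$ are therefore complete sufficient statistics for all mean 
parameters unconditionally. We note that the conditional density $f(Y_{13}, \mathbf{Y}_1, \mathbf{Y}_2 | Q)$ is essentially the 
same as Eq. (5), except that its support is restricted by Q. It must therefore be scaled up by the factor 
$1/P(Q)$, the reciprocal of the probability of Q, in order to integrate to one over the restricted space. 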

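In the spirit of Cohen & Sackrowitz (1989) who investigated two-stage UMVCUEs, a three-stage 
UMVCUE for $\mu_1$ would be a Rao-Blackwellization of the unbiased final stage estimate $Y_{13}$ conditional 
on the complete sufficient statistic $(\mathbf{Y}'_1, \mathbf{Y}'_2, Z_1)$ and under Q. We now clarify precisely how $Y_{13}$ is 
restricted in this setting. At stage one, we can state from Eq. (2) that (a): $Y_{11} > Y_{L+1,1}$ and from 
Eq. (3) (by removing a common term $1/(\sigma_1^2 + \sigma_2^2)$) that: 

$$ \text{(b)}: \quad \sigma_2^2 Y_{11} + \sigma_1^2 Y_{12} > \sigma_2^2 Y_{21} + \sigma_1^2 Y_{22}. $$

The set of $Y_{13}$ values that satisfy conditions (a) and (b) is the sampling distribution of $Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1, Q$. 
From the definition of $Z_1$ (that is, $Z_1 = \frac{\sigma_2\sigma_3}{\sigma_1} Y_{11} + \frac{\sigma_1\sigma_3}{\sigma_2} Y_{12} + \frac{\sigma_1\sigma_2}{\sigma_3} Y_{13}$), condition (b) implies that: 

$$ Y_{13} < \frac{\sigma_3}{\sigma_1\sigma_2} Z_1 - \frac{\sigma_3^2}{\sigma_1^2} Y_{21} - \frac{\sigma_3^2}{\sigma_2^2} Y_{22} $$

and condition (a) implies that: 

$$ Y_{13} < \frac{\sigma_3}{\sigma_1\sigma_2} Z_1 - \frac{\sigma_3^2}{\sigma_2^2} Y_{12} - \frac{\sigma_3^2}{\sigma_1^2} Y_{L+1,1}. $$

Therefore, $Y_{13} | \mathbf{Y}'_1, \mathbf{Y}_2, Z_1, Q$ must be less than 

$$ T = \min\left\{ \frac{\sigma_3}{\sigma_1\sigma_2} Z_1 - \frac{\sigma_3^2}{\sigma_2^2} Y_{12} - \frac{\sigma_3^2}{\sigma_1^2} Y_{L+1,1}, \;\; \frac{\sigma_3}{\sigma_1\sigma_2} Z_1 - \frac{\sigma_3^2}{\sigma_1^2} Y_{21} - \frac{\sigma_3^2}{\sigma_2^2} Y_{22} \right\}. $$

Since T depends explicitly on the value of $Y_{12}$ (as well as implicitly through $Z_1$) it is natural to condition 
on $(\mathbf{Y}'_1, \mathbf{Y}_2, Z_1)$ rather than on $(\mathbf{Y}'_1, \mathbf{Y}'_2, Z_1)$ in order to calculate $E[Y_{13} | \mathbf{Y}'_1, \mathbf{Y}_2, Z_1, Q]$. We will refer to 
this as the "RB1" estimator, and denote it by the symbol $\hat{\mu}_1^{RB1}$. 

Lehmann & Scheffé (1950) have shown that a sufficient and complete statistic is also minimally 
sufficient (the converse is not true). Therefore, since $(\mathbf{Y}'_1, \mathbf{Y}_2, Z_1)$ is sufficient but not minimal, it 
cannot be complete. Thus, the RB1 estimator will be unbiased and have a smaller variance than $Y_{13}$ by 
the Rao-Blackwell theorem, but it is not UMVCU. We now calculate $\hat{\mu}_1^{RB1}$, returning to a discussion 
of what properties can and cannot be claimed by it in Section 3.3. 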

3.1 Deriving the RB1 estimator 

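We will derive the RB1 estimator in the following manner. We start by transforming: 

$$ f(Y_{13}, \mathbf{Y}_1, \mathbf{Y}_2) \rightarrow f(Y_{13}, Z_1, \mathbf{Y}'_1, \mathbf{Y}_2) \rightarrow f(Y_{13} | Z_1, \mathbf{Y}'_1, \mathbf{Y}_2), $$

then show that $Y_{13}$ is independent of $(\mathbf{Y}'_1, \mathbf{Y}'_2, \mu)$ given $(Z_1, Y_{12})$, and finish by going from: 

$$ f(Y_{13} | Z_1, Y_{12}) \rightarrow f(Y_{13} | Z_1, Y_{12}, Q) \rightarrow E(Y_{13} | Z_1, Y_{12}, Q). $$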

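Let the vector $\mu$ and diagonal matrix $\Sigma$ be the mean and variance of the K + L + 1 variables 
$(Y_{13}, \mathbf{Y}_2, \mathbf{Y}_1)^\top$ written as $(Y_{13}, Y_{12}, Y_{11}, \mathbf{Y}'_1, \mathbf{Y}'_2)^\top$. They follow the ordering described in Section 2.2 
and Fig. 2 so that, if $Y_{kj}$ is the l-th variable, the l-th entry of $\mu$ is $\mu_k$ and the (l, l)-th element of $\Sigma$ is 
$\sigma_j^2$. Using a standard result (e.g., Srivastava, 2002, Theorem 2.5.1), $(Y_{13}, Z_1, Y_{12}, \mathbf{Y}'_1, \mathbf{Y}'_2)^\top$ follows the 
(K + L + 1)-dimensional multivariate normal (MVN) distribution $MVN_{K+L+1}(C\mu, C\Sigma C^\top)$, where 

$$ C\mu = \left( \mu_1, \tfrac{B^2}{A}\mu_1, \mu_1, \mu^*, \ldots, \mu^* \right)^\top, \qquad C\Sigma C^\top = \begin{pmatrix} \sigma_3^2 & A & 0 & 0 \\ A & B^2 & A & 0 \\ 0 & A & \sigma_2^2 & 0 \\ 0 & 0 & 0 & \mathrm{diag}(\sigma^{*2}) \end{pmatrix}. $$

For convenience we have labeled the mean and variance parameters of $\mathbf{Y}'_1, \mathbf{Y}'_2$ using the generic symbols 
$\mu^*, \sigma^{*2}$, not because they are all equal, but because they are equally irrelevant to subsequent 
development. Next, we apply another well known theorem on conditional multivariate normal distributions 
(e.g., Srivastava, 2002, Theorem 2.5.5). Let 

$$ C\Sigma C^\top = \Sigma^* = \begin{pmatrix} \Sigma^*_{11} & \Sigma^*_{12} \\ \Sigma^*_{21} & \Sigma^*_{22} \end{pmatrix}, $$

where $\Sigma^*_{11} = \sigma_3^2$ and $\Sigma^*_{12}$ is equal to the 1 x (K + L) vector $(A, 0, \ldots, 0) = \Sigma^{*\top}_{21}$. The inverse of the 
remaining (K + L) x (K + L) submatrix, $\Sigma^*_{22}$, equals 

$$ \Sigma^{*-1}_{22} = \begin{pmatrix} \sigma_2^2/D & -A/D & 0 \\ -A/D & B^2/D & 0 \\ 0 & 0 & \mathrm{diag}(\sigma^{*-2}) \end{pmatrix}, $$

where $D = B^2\sigma_2^2 - A^2$. Defining $a = (Z_1, Y_{12}, \mathbf{Y}'_1, \mathbf{Y}'_2)$, and $\mu_a = (\tfrac{B^2}{A}\mu_1, \mu_1, \mu^*, \ldots, \mu^*)$, we can write the 
conditional distribution $f(Y_{13} | Z_1, \mathbf{Y}_2, \mathbf{Y}'_1)$ as $N(\tilde{\mu}, \tilde{\sigma}^2)$ where: 

$$ \tilde{\mu} = \mu_1 + \Sigma^*_{12} \Sigma^{*-1}_{22} (a - \mu_a)^\top = \mu_1 + \frac{A\sigma_2^2}{D}\left( Z_1 - \frac{B^2}{A}\mu_1 \right) - \frac{A^2}{D}(Y_{12} - \mu_1) = \frac{A}{D}\left( \sigma_2^2 Z_1 - A Y_{12} \right), $$

and $\tilde{\sigma}^2 = \Sigma^*_{11} - \Sigma^*_{12}\Sigma^{*-1}_{22}\Sigma^*_{21} = \sigma_3^2 - A^2\sigma_2^2/D$. The conditional density of $Y_{13}$ given $Z_1$ and $Y_{12}$ does not 
depend on $\mathbf{Y}'_1$, $\mathbf{Y}'_2$ or $\mu$. We therefore drop irrelevant terms by writing it as $f(Y_{13} | Z_1, Y_{12})$. Only now 
do we condition on event Q, which acts to restrict $Y_{13} | Z_1, Y_{12}, Q$ to be $< T = t$, the value of T being 
fixed by the observed values of $Z_1$, $Y_{21}$, $Y_{22}$ and $Y_{L+1,1}$ ($z_1, y_{21}, y_{22}, y_{L+1,1}$). This yields the density 

$$ f(Y_{13} | Z_1, Y_{12}, Q) = \frac{1}{P(Q)} \frac{1}{\tilde{\sigma}} \phi\left( \frac{Y_{13} - \tilde{\mu}}{\tilde{\sigma}} \right) I_Q, $$

where $I_Q$ is the indicator function for event Q and 

$$ P(Q) = \int_{-\infty}^{t} \frac{1}{\tilde{\sigma}} \phi\left( \frac{Y_{13} - \tilde{\mu}}{\tilde{\sigma}} \right) dY_{13} = \Phi\left( \frac{t - \tilde{\mu}}{\tilde{\sigma}} \right). $$

Taking the expectation of $Y_{13} | Z_1, Y_{12}, Q$ yields 

$$ \hat{\mu}_1^{RB1} = \tilde{\mu} - \tilde{\sigma}\, \frac{\phi\left( \frac{t - \tilde{\mu}}{\tilde{\sigma}} \right)}{\Phi\left( \frac{t - \tilde{\mu}}{\tilde{\sigma}} \right)}. \qquad (6) $$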
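
The estimator in Eq. (6) is the mean of a normal distribution truncated above at t, so it is cheap to compute. The sketch below is a hedged illustration for a 3:2:1 trial and not the authors' code. It assumes $A = \sigma_1\sigma_2\sigma_3$, $B^2 = \sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2$ and $Z_1 = A \sum_j Y_{1j}/\sigma_j^2$, takes t as the smaller of the two bounds implied by selection conditions (a) and (b), and uses hypothetical function names throughout. The small simulation harness checks empirically that the resulting estimate has negligible bias.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for phi and Phi

def truncated_mean(mu, sd, t):
    """Mean of N(mu, sd^2) truncated above at t, i.e. the form of Eq. (6)."""
    w = (t - mu) / sd
    return mu - sd * STD.pdf(w) / max(STD.cdf(w), 1e-300)  # guard against underflow

def rb1(y11, y12, y13, y21, y22, y_drop, s1, s2, s3):
    """RB1 estimate; y_drop is the stage one estimate Y_{L+1,1} of the dropped arm."""
    A = s1 * s2 * s3
    B2 = (s1 * s2) ** 2 + (s1 * s3) ** 2 + (s2 * s3) ** 2
    z1 = A * (y11 / s1**2 + y12 / s2**2 + y13 / s3**2)
    D = B2 * s2**2 - A**2
    mu_t = A * (s2**2 * z1 - A * y12) / D               # conditional mean
    sd_t = (s3**2 - A**2 * s2**2 / D) ** 0.5            # conditional sd
    c = s3 / (s1 * s2)
    t_a = c * z1 - (s3 / s2) ** 2 * y12 - (s3 / s1) ** 2 * y_drop  # from condition (a)
    t_b = c * z1 - (s3 / s1) ** 2 * y21 - (s3 / s2) ** 2 * y22     # from condition (b)
    return truncated_mean(mu_t, sd_t, min(t_a, t_b))

def simulate_321(mu, s1, s2, s3, rng):
    """One 3:2:1 trial; returns the selected arm's true mean and its labeled statistics."""
    y1 = [rng.gauss(m, s1) for m in mu]
    order = sorted(range(3), key=lambda k: y1[k], reverse=True)
    kept, dropped = order[:2], order[2]
    y2 = {k: rng.gauss(mu[k], s2) for k in kept}
    # Ranking by s2^2*Y_k1 + s1^2*Y_k2 is ranking by the cumulative MLE.
    win, lose = sorted(kept, key=lambda k: s2**2 * y1[k] + s1**2 * y2[k], reverse=True)
    y13 = rng.gauss(mu[win], s3)
    return mu[win], (y1[win], y2[win], y13, y1[lose], y2[lose], y1[dropped])

def rb1_bias_mse(mu=(0.0, 0.0, 0.0), sig=(1.0, 1.0, 1.0), n_sims=20000, seed=1):
    """Monte Carlo bias and MSE of the RB1 estimate."""
    rng = random.Random(seed)
    errs = []
    for _ in range(n_sims):
        target, ys = simulate_321(mu, *sig, rng)
        errs.append(rb1(*ys, *sig) - target)
    return sum(errs) / n_sims, sum(e * e for e in errs) / n_sims
```

With unit stage-wise standard deviations the conditional mean reduces to $(Y_{11} + Y_{13})/2$ and the conditional variance to 1/2, which is a convenient hand check on the algebra.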



3.2 Deriving the RB2 estimator 

The RB1 estimator in (6) does not look like a direct analog of the two-stage UMVCUE of Bowden 
and Glimm (2008), a correction to the two-stage MLE, because of the extra conditioning on $Y_{12}$. 
One way of avoiding this extra conditioning would be to calculate $E[Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1, Q^*]$, for a $Q^*$ that 
did not depend on $Y_{12}$. This can be achieved by defining the condition $Q^*$ simply as the portion of Q 
coming from the stage two selection rule (b). We can then state that $Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1, Q^* < T^*$ where: 

$$ T^* = \frac{\sigma_3}{\sigma_1\sigma_2} Z_1 - \frac{\sigma_3^2}{\sigma_1^2} Y_{21} - \frac{\sigma_3^2}{\sigma_2^2} Y_{22}. $$

The quantity $E[Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1, Q^*]$ can then be obtained following the procedure as in Section 3.1, to 
yield an alternative estimator for $\mu_1$, which we will denote as $\hat{\mu}_1^{RB2}$ and refer to as the "RB2" estimator. 
It has the same form as (6) but with t replaced by the observed value $t^*$ of $T^*$, with $\tilde{\mu}$ equal to the 
MLE $\hat{\mu}_1^{MLE} = AZ_1/B^2$ and with $\tilde{\sigma}$ equal to $\sqrt{\sigma_3^2 - A^2/B^2}$. 
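
A sketch of the RB2 calculation follows, again as an illustration under assumptions rather than the authors' implementation. It takes $A = \sigma_1\sigma_2\sigma_3$, $B^2 = \sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2$ and $Z_1 = A \sum_j Y_{1j}/\sigma_j^2$, centres the truncated normal at the MLE $AZ_1/B^2$ with variance $\sigma_3^2 - A^2/B^2$, and truncates at the bound implied by the stage two rule (b) alone. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for phi and Phi

def rb2(y11, y12, y13, y21, y22, s1, s2, s3):
    """RB2 estimate: truncated-normal mean centred at the MLE, bound from rule (b) only."""
    A = s1 * s2 * s3
    B2 = (s1 * s2) ** 2 + (s1 * s3) ** 2 + (s2 * s3) ** 2
    z1 = A * (y11 / s1**2 + y12 / s2**2 + y13 / s3**2)
    mu_t = A * z1 / B2                                   # MLE of the selected mean
    sd_t = (s3**2 - A**2 / B2) ** 0.5
    t_star = s3 / (s1 * s2) * z1 - (s3 / s1) ** 2 * y21 - (s3 / s2) ** 2 * y22
    w = (t_star - mu_t) / sd_t
    return mu_t - sd_t * STD.pdf(w) / max(STD.cdf(w), 1e-300)

def rb2_bias_mse(mu=(0.0, 0.0, 0.0), sig=(1.0, 1.0, 1.0), n_sims=20000, seed=1):
    """Monte Carlo bias and MSE of RB2 under full 3:2:1 drop-the-losers selection."""
    s1, s2, s3 = sig
    rng = random.Random(seed)
    errs = []
    for _ in range(n_sims):
        y1 = [rng.gauss(m, s1) for m in mu]
        kept = sorted(range(3), key=lambda k: y1[k], reverse=True)[:2]
        y2 = {k: rng.gauss(mu[k], s2) for k in kept}
        win, lose = sorted(kept, key=lambda k: s2**2 * y1[k] + s1**2 * y2[k],
                           reverse=True)
        y13 = rng.gauss(mu[win], s3)
        est = rb2(y1[win], y2[win], y13, y1[lose], y2[lose], s1, s2, s3)
        errs.append(est - mu[win])
    return sum(errs) / n_sims, sum(e * e for e in errs) / n_sims
```

Because the stage one condition (a) is ignored when truncating, a small residual bias is to be expected from `rb2_bias_mse()`, in exchange for a lower mean squared error than a fully conditioned estimate.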

3.3 Summarizing the RB1 and RB2 estimators 

The RB2 estimator is the expected value of $Y_{13}$ given the complete sufficient statistic and under 
condition $Q^*$. So, if $Q^*$ truly represented the condition restricting $Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1$ under drop-the-losers 
selection, then $\hat{\mu}_1^{RB2}$ would be the UMVUE given $Q^*$. However, no such claim can be made because 
$Q^*$ is not the correct condition, it is Q, so $E[Y_{13} | \mathbf{Y}'_1, \mathbf{Y}'_2, Z_1, Q^*]$ is conditionally biased with respect to 
$\mu_1$. Likewise, the RB1 estimator cannot claim to be the UMVUE, conditional on Q, because the 
statistic it conditions on is not minimal sufficient (and hence not complete). 

One could argue, however, that if we simply condition on $Y_{12}$ and Q first, then $(\mathbf{Y}'_1, \mathbf{Y}'_2, Z_1)$ is a 
minimal sufficient statistic and therefore the RB1 estimator is a UMVCUE of sorts. This might be 
perceived as simply a semantic trick, and it is clear that conditioning on $Y_{12}$ is not as natural as 
conditioning on Q alone. For this reason we will refer to $\hat{\mu}_1^{RB1}$ and $\hat{\mu}_1^{RB2}$ as simply Rao-Blackwellized 
estimators. 

3.3.1 A general formula for the RB1 estimator 

In the Appendix, we derive the $\hat{\mu}_1^{RB1}$ estimator for a J-stage drop-the-losers trial. This estimator will 
yield a more efficient, unbiased estimate of $\mu_1$ than using the stage J data alone. For the J-stage 
case, one is forced to condition on $Z_1$ and $J - 2$ additional variables corresponding to the individual 
treatment effect estimates of the ultimately selected treatment at stages 2 to $J - 1$. Although the precise 
values of $\tilde{\mu}$, $\tilde{\sigma}$ and t change, the J-stage estimator is identical in form to Eq. (6). Since it only ever 
requires the evaluation of a standard normal density and distribution function, it remains trivial to 
evaluate whatever the value of J. 

3.4 Strength of unbiasedness of the RB1 estimator 

Since $\mu_1$ is a random variable, it is convenient to consider the extended definition of bias due to Posch 
et al. (2005). That is, for a generic estimator $\hat{\mu}_1$ of $\mu_1$, the bias is given by: 

$$ \mathrm{Bias}(\hat{\mu}_1) = \sum_{k=1}^{K} E[\hat{\mu}_1 - \mu_k \,|\, \psi_k = 1] \Pr(\psi_k = 1) \qquad (7) $$
$$ = B_1(\hat{\mu}_1) P_1 + B_2(\hat{\mu}_1) P_2 + \cdots + B_K(\hat{\mu}_1) P_K. $$

Here $\mu_k$ refers to the true mean of treatment k, the k-th element of $\mu$. $\psi_k = 1$ is the condition that 
treatment k is selected under the design, $P_k = \Pr(\psi_k = 1)$, and $B_k(\hat{\mu}_1)$ represents the bias of estimator $\hat{\mu}_1$ conditional on 
treatment k being selected. Although $\mathrm{Bias}(\hat{\mu}_1^{RB1}) = 0$, it actually fulfills a stronger form of unbiasedness, 
namely that $B_k(\hat{\mu}_1^{RB1}) = 0$ for all k. 
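
The decomposition in Eq. (7) can be estimated directly from simulation output by grouping trials according to which treatment was selected. The following sketch does this for the MLE in a 3:2:1 design. It is illustrative only: it assumes equal stage-wise standard deviations, and `one_trial` and `bias_decomposition` are hypothetical names.

```python
import random

def one_trial(mu, rng, sigma=1.0):
    """3:2:1 trial with equal stage-wise sd; returns (selected arm, MLE of its mean)."""
    y1 = [rng.gauss(m, sigma) for m in mu]
    kept = sorted(range(3), key=lambda k: y1[k], reverse=True)[:2]
    y2 = {k: rng.gauss(mu[k], sigma) for k in kept}
    win = max(kept, key=lambda k: y1[k] + y2[k])
    y3 = rng.gauss(mu[win], sigma)
    return win, (y1[win] + y2[win] + y3) / 3.0

def bias_decomposition(mu, n_sims=20000, seed=1):
    """Estimate P_k, B_k and the overall bias sum_k B_k * P_k of Eq. (7)."""
    rng = random.Random(seed)
    count = [0] * len(mu)
    err_sum = [0.0] * len(mu)
    for _ in range(n_sims):
        k, est = one_trial(mu, rng)
        count[k] += 1
        err_sum[k] += est - mu[k]          # error relative to the selected arm's mean
    P = [c / n_sims for c in count]
    B = [err_sum[k] / count[k] if count[k] else 0.0 for k in range(len(mu))]
    total = sum(b * p for b, p in zip(B, P))
    return P, B, total
```

An estimator that is unbiased in the stronger sense would show every entry of `B` close to zero, not just a `total` close to zero.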

3.5 An alternative near-unbiased estimator 

In Section 4, we show the price paid by the RB1 estimator for unbiasedness is a substantial increase 
in mean squared error (MSE). Part of the motivation for developing the RB2 estimator was to see if 
it could trade off small amounts of bias for an MSE reduction. An additional bias-adjusted (but not 
unbiased) estimator is now considered. Bebu et al. (2010) proposed a likelihood based procedure for 
obtaining a bias corrected MLE (which we will refer to as the BC-MLE) and confidence intervals for 
the selected treatment in a two-stage drop-the-losers trial. We now adapt their general approach to the 
specific example of a 3:2:1 drop-the-losers design setting in order to complement the simulations in 
Section 4. Extensions to the general three-stage case are obvious, but when more treatment arms are 
added the computational effort in obtaining the BC-MLE increases markedly. 

Assuming that the variances of all stage-wise statistics are known, the log-likelihood of the parameter 
vector $\mu$ conditional on $Q: \psi = (1, 2, 3)$ is proportional to 



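$$ -\frac{1}{2}\left[ \left( \frac{Z_1 - \frac{B^2}{A}\mu_1}{B} \right)^2 + \left( \frac{Z_2 - \mu_2}{\sigma_{Z_2}} \right)^2 + \left( \frac{Y_{31} - \mu_3}{\sigma_1} \right)^2 \right] - \log P(Q|\mu). \qquad (8) $$

Here, A, B, and $Z_1$ are as defined in Section 3 after Eq. (5), $AZ_1/B^2$ being equal to the selected 
treatment's MLE with variance $A^2/B^2$. $Z_2$ is defined here as the MLE of the second best treatment 
at stage 2, $(\sigma_2^2 Y_{21} + \sigma_1^2 Y_{22})/(\sigma_1^2 + \sigma_2^2)$, with mean $\mu_2$ and variance $\sigma_{Z_2}^2 = \sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)$. $Y_{31}$ is the sole statistic on the 
treatment that ranked last at stage one, with mean $\mu_3$ and variance $\sigma_1^2$. The penalizing term $P(Q|\mu)$ 
represents the probability of event Q given $\mu$. This is equivalent to the probability that all three elements 
of the multivariate density: 

$$ \begin{pmatrix} Y_{11} - Y_{31} \\ Y_{21} - Y_{31} \\ \sigma_2^2 (Y_{11} - Y_{21}) + \sigma_1^2 (Y_{12} - Y_{22}) \end{pmatrix} \sim MVN\left( \begin{pmatrix} \mu_1 - \mu_3 \\ \mu_2 - \mu_3 \\ (\sigma_1^2 + \sigma_2^2)(\mu_1 - \mu_2) \end{pmatrix}, \begin{pmatrix} 2\sigma_1^2 & \sigma_1^2 & \sigma_1^2\sigma_2^2 \\ \sigma_1^2 & 2\sigma_1^2 & -\sigma_1^2\sigma_2^2 \\ \sigma_1^2\sigma_2^2 & -\sigma_1^2\sigma_2^2 & 2\sigma_1^2\sigma_2^2(\sigma_1^2 + \sigma_2^2) \end{pmatrix} \right) $$

are positive, and it can be approximated to a high degree of accuracy in R using the pmvnorm() function 
(Genz & Bretz, 2009). The conditional log-likelihood (8) can then be maximized to yield joint estimates 
for $\mu_1$, $\mu_2$ and $\mu_3$. 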
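
The penalizing term can also be approximated by brute-force simulation, which is a useful cross-check on a numerical integration routine. The sketch below is a Monte Carlo stand-in for that step and is not the pmvnorm() calculation itself: it draws the stage one and stage two statistics of a 3:2:1 trial and counts how often the three differences defining Q are all positive. The function name `prob_Q` is hypothetical, and $\mu$ is taken in the labeled order (selected, runner-up, dropped).

```python
import random

def prob_Q(mu, s1, s2, n_sims=20000, seed=1):
    """Monte Carlo estimate of P(Q | mu) for a 3:2:1 trial with psi = (1, 2, 3)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        y11, y21, y31 = (rng.gauss(m, s1) for m in mu)             # stage one estimates
        y12, y22 = rng.gauss(mu[0], s2), rng.gauss(mu[1], s2)      # stage two estimates
        e1 = y11 - y31                                   # arm 1 survives stage one
        e2 = y21 - y31                                   # arm 2 survives stage one
        e3 = s2**2 * (y11 - y21) + s1**2 * (y12 - y22)   # arm 1 beats arm 2 at stage two
        hits += (e1 > 0 and e2 > 0 and e3 > 0)
    return hits / n_sims
```

When all three means are equal the six possible orderings are equally likely, so `prob_Q((0.0, 0.0, 0.0), 1.0, 1.0)` should be close to 1/6, which gives a simple sanity check.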



4 Simulation studies 

We now conduct a simulation study to compare the performance of the MLE, RB1, RB2, BC-MLE, 
and stage three estimator $Y_{13}$ in estimating $\mu_1$ under drop-the-losers selection. Various parameter 
constellations for the true treatment means and the stage-wise variances are considered. We note that 
in a real trial setting the quantity of interest would be the treatment versus control group comparison 
$\mu_1 - \mu_0$, but for reasons already discussed we can ignore estimation of $\mu_0$. By averaging over the results 
of all simulations, we obtain a Monte-Carlo estimate for the estimators' bias as given in Eq. (7). By 
averaging the estimators' squared errors across all simulations, we can approximate their MSE, when 
defined analogously to Eq. (7) as well. 



4.1 Point estimation for a 3:2:1 trial 

In a simple initial simulation, all three treatments are assumed to have a true mean effect of 0, so that 
$\mu_1 = 0$. The number of patients recruited to each remaining treatment arm at each stage ($n_1, n_2, n_3$) is 
100, 50, and 25, respectively. $v^2 = 50$ so that $(\sigma_1, \sigma_2, \sigma_3) = (1/\sqrt{2}, 1, \sqrt{2})$. Figure 3 shows the distribution 
of 100,000 realizations of the MLE, RB1 estimator, RB2 estimator, BC-MLE, and $Y_{13}$. From this 
simulation their empirical biases and MSEs were (0.41, 0.00, 0.04, -0.26, 0.00) and (0.35, 0.85, 0.69, 
0.80, 2.00), respectively. $Y_{13}$ is unbiased but inefficient and at the other extreme the MLE is efficient but 
biased. The RB1 estimator is unbiased and has a substantially lower MSE than $Y_{13}$ because it utilizes 
data from the first two stages. Incorrect conditioning induces a small amount of bias into the RB2 

[Figure 3: distributions of the MLE, RB1, RB2, BC-MLE and stage three $Y_{13}$ estimates, plotted against "Estimate" on an axis from -4 to 4.]

Figure 3 Distribution of the 5 estimators' estimates for $\mu_1 = 0$ from a three-stage 3:2:1 drop-the- 
losers design, given stage-wise standard deviations $(\sigma_1, \sigma_2, \sigma_3) = (1/\sqrt{2}, 1, \sqrt{2})$. Each distribution is based on 
100,000 simulations. 



Table 1 Bias and MSE of the various estimators over the four scenarios (50,000 simulations per 
scenario). In each case $(\sigma_1, \sigma_2, \sigma_3) = (1, 1, 1)$. 

Parameter values    MLE      RB1      RB2      BC-MLE   Stage three $Y_{13}$

Bias
(0, 0, 0)           0.377   -0.005    0.038   -0.142   -0.004
(0, 1/2, 1/2)       0.357    0.007    0.043   -0.132    0.007
(0, 1/2, 1)         0.310    0.004    0.041   -0.122    0.012
(0, 1/2, 2)         0.111   -0.005    0.010   -0.092    0.012

MSE
(0, 0, 0)           0.388    0.690    0.563    0.579    0.982
(0, 1/2, 1/2)       0.384    0.686    0.550    0.570    1.002
(0, 1/2, 1)         0.372    0.670    0.524    0.553    1.017
(0, 1/2, 2)         0.347    0.587    0.413    0.471    1.019



estimator but is accompanied by a substantial decrease in MSE. The BC-MLE reduces the magnitude 
of the bias in the MLE but is shown to overcorrect. Its modal value is, however, close to the true value 
of 0. 
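The following Python sketch illustrates how a simulation of this kind could be set up for the MLE and the stage-three estimate only; it is not the authors' R code. It assumes normally distributed stage-wise estimates with variances $\sigma_j^2$, that the worst arm is dropped after stage one on the basis of the stage-one estimates, and that the final arm is selected after stage two on the basis of the inverse-variance weighted cumulative MLE. The function name `simulate_321` and its arguments are illustrative.

```python
import random


def simulate_321(n_sims, mu=(0.0, 0.0, 0.0), sigma2=(0.5, 1.0, 2.0), seed=1):
    """Simulate a three-stage 3:2:1 drop-the-losers trial.

    Returns the errors (estimate minus true mean of the selected arm)
    of the final MLE and of the stage-three estimate.
    """
    rng = random.Random(seed)
    w = [1.0 / s for s in sigma2]        # inverse-variance weights per stage
    sd = [s ** 0.5 for s in sigma2]      # stage-wise standard deviations
    mle_err, y3_err = [], []
    for _ in range(n_sims):
        # Stage 1: three arms; keep the two largest stage-one estimates.
        y1 = [rng.gauss(m, sd[0]) for m in mu]
        keep = sorted(range(3), key=lambda i: y1[i], reverse=True)[:2]
        # Stage 2: two arms; select the larger cumulative MLE.
        y2 = {i: rng.gauss(mu[i], sd[1]) for i in keep}
        cum = {i: (w[0] * y1[i] + w[1] * y2[i]) / (w[0] + w[1]) for i in keep}
        best = max(keep, key=lambda i: cum[i])
        # Stage 3: selected arm only.
        y3 = rng.gauss(mu[best], sd[2])
        mle = (w[0] * y1[best] + w[1] * y2[best] + w[2] * y3) / sum(w)
        mle_err.append(mle - mu[best])
        y3_err.append(y3 - mu[best])
    return mle_err, y3_err


mle_err, y3_err = simulate_321(20000)
print("MLE bias:", sum(mle_err) / len(mle_err))
print("Stage-three bias:", sum(y3_err) / len(y3_err))
print("Stage-three MSE:", sum(e * e for e in y3_err) / len(y3_err))
```

Under these assumptions the MLE should show a clear positive bias, while the stage-three estimate should be approximately unbiased with an MSE close to $\sigma_3^2$.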

In Table 1, we show the bias and MSE of the estimators for this choice of parameter values and 
three additional true parameter constellations. Trial data were simulated with 50 patients per-arm 
per-stage so that $\sigma_1 = \sigma_2 = \sigma_3 = 1$. The same general pattern is observed across all scenarios, except 
the magnitudes of the biases and MSEs change. 
Table 2 Coverage and mean confidence interval width for the BC-MLE, RB1, and RB2 estimators 
(10,000 simulations per scenario). In each case $(\sigma_1, \sigma_2, \sigma_3) = (1, 1, 1)$. 

                    RB1                   BC-MLE                RB2
Parameter values    Coverage  CI width    Coverage  CI width    Coverage  CI width
(0, 0, 0)           0.963     3.31        0.949     2.92        0.951     2.91
(0, ?, ?)           0.958     3.27        0.950     2.90        0.949     2.87
(0, ?, 1)           0.954     3.20        0.948     2.83        0.946     2.80
(0, ?, 2)           0.956     2.96        0.945     2.57        0.956     2.49


4.2 Interval estimation for a 3:2:1 trial 

It is possible to derive an expression for the variance of the RB1 estimator using the delta method, 
as, for example, Koopmeiners et al. (2012) do in the context of a single arm two-stage trial with a 
binary endpoint. However, from Fig. 3 we see that, in the context of a three-stage drop-the-losers trial, 
the distribution of $\hat{\mu}_1^{RB1}$ is highly skewed. Therefore, even if we were to derive analogous expressions 
for the variance of these quantities, it would not appear sensible to use them to furnish symmetric 
confidence intervals around the point estimate. For this reason, we adapt the nonparametric bootstrap 
procedure, originally proposed by Pepe et al. (2009) for a single arm two-stage trial, to the 
three-stage drop-the-losers setting. Specifically, we apply the following resampling scheme to the trial data: 

(1) Produce a bootstrap sample of the first stage data, with mean $Y^*_{11}$. 

(2) If $Y^*_{11}$ is > the original observed value $Y_{L+1,1}$: 

a. Produce bootstrap samples of the second stage data, with mean $Y^*_{12}$. 

b. If the stage two MLE $\hat{\mu}^*_{12}$ is > the original observed value $\hat{\mu}_{22}$: 

i. Produce bootstrap samples of the third stage data, with mean $Y^*_{13}$. 

ii. Calculate the RB1 estimator $\hat{\mu}^*$ from Eq. (6) given $Y^*_{11}$, $Y^*_{12}$, $Y^*_{13}$, and the original observed 
value $t$. 

This should be repeated until a large enough collection of $\hat{\mu}^*$s has been obtained to accurately 
assess its sampling distribution. Empirical quantiles of this distribution can then be read off to give 
confidence intervals for $\hat{\mu}_1^{RB1}$. Upon implementing this procedure, it is no extra effort to additionally 
calculate bootstrapped confidence intervals for the RB2 estimator at the same time. 
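The resampling scheme above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the patient-level input lists and the two thresholds are hypothetical, and because Eq. (6) is not restated here the point estimator is passed in as a user-supplied function `estimator` (which may itself capture the original observed value $t$).

```python
import random
from statistics import fmean


def resample_mean(data, rng):
    """Mean of one nonparametric bootstrap resample (drawn with replacement)."""
    return fmean(rng.choices(data, k=len(data)))


def conditional_bootstrap(stage1, stage2, stage3, thresh1, thresh2,
                          cum_mle, estimator, n_boot=1000,
                          max_tries=100_000, seed=1):
    """Bootstrap the selected arm, keeping only resamples that pass selection.

    stage1..stage3 : patient-level outcomes of the selected arm at each stage
    thresh1        : original stage-one estimate the selected arm had to beat
    thresh2        : original stage-two cumulative MLE it had to beat
    cum_mle(y1, y2): stage-two cumulative MLE (user supplied)
    estimator      : maps (y1, y2, y3) to a point estimate (user supplied)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(max_tries):
        if len(draws) == n_boot:
            break
        y1 = resample_mean(stage1, rng)          # step (1)
        if y1 <= thresh1:                        # step (2) fails: discard
            continue
        y2 = resample_mean(stage2, rng)          # step (2a)
        if cum_mle(y1, y2) <= thresh2:           # step (2b) fails: discard
            continue
        y3 = resample_mean(stage3, rng)          # step (2b-i)
        draws.append(estimator(y1, y2, y3))      # step (2b-ii)
    return draws


def percentile_interval(draws, alpha=0.05):
    """Empirical (alpha/2, 1 - alpha/2) quantiles of a non-empty list of draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[round((1 - alpha / 2) * last)]
```

Passing a second estimator function to the same routine would give the corresponding interval for another estimator from the same accepted resamples.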

Confidence intervals for the BC-MLE of $\mu_1$ are obtained by the profile likelihood approach described 
in Bebu et al. (2010). That is, we calculate the statistic $R(\mu_1^{p})$ as twice the difference between the 
log-likelihood evaluated at the joint BC-MLEs for $(\mu_1, \mu_2, \mu_3)$ and at the BC-MLEs for $(\mu_2, \mu_3)$ given 
the constraint $\mu_1 = \mu_1^{p}$. A $(1 - \alpha)$ level confidence interval for $\mu_1$ is then the set of values for which 

$$R(\mu_1^{p}) < \chi^2_1(1 - \alpha).$$
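The inversion of a likelihood ratio statistic of this kind can be sketched generically in Python. The conditional likelihood of Bebu et al. (2010) is not reproduced here; instead the sketch takes a hypothetical user-supplied function `profile_loglik` (the log-likelihood maximised over the remaining parameters with $\mu_1$ held fixed) and scans a grid of candidate values. The toy usage assumes a single normal mean with known standard error, purely as a stand-in.

```python
CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 d.f.


def profile_interval(profile_loglik, loglik_max, grid, crit=CHI2_1_95):
    """Range of grid values mu with R(mu) = 2*(loglik_max - profile_loglik(mu)) < crit."""
    accepted = [m for m in grid
                if 2.0 * (loglik_max - profile_loglik(m)) < crit]
    if not accepted:
        return None
    return min(accepted), max(accepted)


# Toy stand-in: one normal estimate 0.3 with standard error 1.
est, se = 0.3, 1.0
loglik = lambda m: -0.5 * ((est - m) / se) ** 2
grid = [i / 1000.0 for i in range(-3000, 3001)]
print(profile_interval(loglik, loglik(est), grid))
```

In the toy case the interval should be close to the usual estimate plus or minus 1.96 standard errors; a root finder could replace the grid when each likelihood evaluation is expensive.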

Table 2 shows, for $\alpha = 0.05$, the average confidence interval width and the resulting coverage of the 
BC-MLE, RB1, and RB2 estimators for the four scenarios already introduced and with 50 patients 
per treatment arm per stage as before. Within each simulation, confidence intervals and coverage 
were assessed with respect to the true (fixed) value of $\mu_1$, and then averaged across simulations. Each 
bootstrapped confidence interval calculated was based on 1000 simulated values of $\hat{\mu}^*$. The overall 
figures are based on only 10,000 simulations: obtaining a confidence interval for the BC-MLE using 
the profile likelihood method required substantially more computational effort than the bootstrap 
procedure, and so this was the limiting factor. 
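As a small illustration of how coverage and average width might be tallied across simulations, the Python helper below summarises a list of intervals against a known true value. The function name `summarize_intervals` and the returned keys are hypothetical; "above" and "below" are taken to mean that the whole interval lies above or below the true value.

```python
def summarize_intervals(intervals, true_value):
    """Coverage, mean width and one-sided miss rates of (lower, upper) intervals."""
    n = len(intervals)
    above = sum(1 for lo, hi in intervals if lo > true_value)   # interval entirely above
    below = sum(1 for lo, hi in intervals if hi < true_value)   # interval entirely below
    mean_width = sum(hi - lo for lo, hi in intervals) / n
    return {
        "coverage": 1.0 - (above + below) / n,
        "mean_width": mean_width,
        "prop_above": above / n,
        "prop_below": below / n,
    }


# Toy usage with four made-up intervals and a true value of 0.
print(summarize_intervals([(-1, 1), (0.5, 2), (-3, -1), (-2, 2)], 0.0))
```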
Figure 4 Bias, MSE, coverage (w.r.t. 95% confidence intervals), and average 95% confidence interval 
width of the MLE, RB1, and RB2 estimators as a function of K, for $\mu_1 = 0$, $\sigma_1 = \sigma_2 = \sigma_3 = 1$. The stage 
three estimate $Y_{13}$ is also shown where informative. 



Both the bootstrap and profile likelihood approaches appear to provide confidence intervals with 
nominal coverage. The RB1 estimate's confidence interval width is by far the widest. The smallest 
width is obtained from the RB2 estimator, but it is only marginally smaller than that of the BC-MLE. 

4.3 Further results for K:2:1 trials 

Trial data are now simulated under a K:2:1 design with all treatment means equal to 0 (so that $\mu_1 = 0$), 
as in Section 2.1. Figure 4 (top-left to bottom-right) plots the bias, MSE, coverage and confidence 
interval width of the MLE, RB1, and RB2 estimators as a function of K. The properties of the stage 
three estimator $Y_{13}$ are shown where informative (it is unbiased, with a known variance, so a simple 
symmetric confidence interval around it will achieve its nominal coverage). Each point is based on 
50,000 simulations. The BC-MLE is not evaluated as it becomes too computationally demanding. 
However, there is no reason to suspect that its performance significantly worsens as more treatment 
arms are added. Standard symmetric confidence intervals are used for the MLE, assuming a normal 
Table 3 Proportion of times the 95% confidence interval for the MLE, RB1, and RB2 estimators are 
above or below $\mu_1$. 

      RB1                       RB2                       MLE
K     % Above     % Below       % Above     % Below       % Above     % Below
      $\mu_1$     $\mu_1$       $\mu_1$     $\mu_1$       $\mu_1$     $\mu_1$
3     3.9         0.48          5.3         0.130         6.4         0.084
4     4.0         0.34          6.2         0.052         7.9         0.044
5     4.2         0.32          6.9         0.044         9.6         0.012
6     4.1         0.31          7.1         0.044         10.0        0.016
7     4.3         0.30          7.7         0.024         12.0        0.016
8     4.3         0.32          8.0         0.032         13.0        0.012


distribution, and ignoring selection. While this could be termed a "standard" analysis, there is of 
course no reason to believe that this confidence interval will achieve its nominal coverage probability. 
As K increases, the bias of the RB2 estimator increases but stays at modest levels compared to the MLE. 
The RB1 estimator and $Y_{13}$ are unbiased. The MSE of the MLE is substantially lower than the other 
two estimators when $K = 3$, but rises more quickly than the other two as K increases. The MSE of $Y_{13}$ 
is 1, by definition. 

The coverage of the RB1 estimator's bootstrapped 95% confidence interval stays relatively constant 
over the range of K, but is always slightly conservative. The RB2 estimator's coverage starts to worsen 
as K increases, but not to the same extent as the MLE. Table 3 shows, for these three estimators, the 
proportion of times that their 95% confidence intervals are above or below the true value of $\mu_1$. In 
each case, being above the true value is far more likely. This can be understood by the fact that all three 
are more likely to over-estimate than under-estimate the true effect. For example, in the simulations 
shown in Fig. 3 the MLE, RB1, and RB2 estimators overestimate $\mu_1$ 83%, 57%, and 61% of the time, 
respectively. Of course, in the case of the RB1 estimator, this tendency to overestimate is perfectly 
cancelled out by less frequent, but larger, underestimation so that 

$$E[\hat{\mu}_1^{RB1} \mid \hat{\mu}_1^{RB1} > \mu_1]\Pr(\hat{\mu}_1^{RB1} > \mu_1) + E[\hat{\mu}_1^{RB1} \mid \hat{\mu}_1^{RB1} < \mu_1]\Pr(\hat{\mu}_1^{RB1} < \mu_1) = \mu_1.$$

While the above/below ratio of the RB1 estimator stays 
fairly constant (around 10:1), the RB2 and MLE above/below ratio increases rapidly with increasing 
K. This reflects their increasing positive bias. 

The MLE's average confidence interval width (a constant value of $3.92\sqrt{1/3}$) is far lower than the 
other two estimators; the RB1 estimator's confidence interval is on average 60% wider than the MLE's 
when $K = 8$, but the MLE suffers from suboptimal coverage as a result. The confidence interval width 
of $Y_{13}$ is a constant value of 3.92, which gives an idea as to the additional gain in using the RB1 
estimator, if unbiased estimation is required. 
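As a worked check of these two constants, assume (as in Fig. 4) that $\sigma_1 = \sigma_2 = \sigma_3 = 1$ and that a symmetric normal-theory 95% interval of half-width 1.96 standard errors is used. Then

$$\text{width}(Y_{13}) = 2 \times 1.96 \times \sigma_3 = 3.92, \qquad \text{width}(\text{MLE}) = 2 \times 1.96 \times \sqrt{\frac{1}{1/\sigma_1^2 + 1/\sigma_2^2 + 1/\sigma_3^2}} = 3.92\sqrt{1/3} \approx 2.26.$$

An interval that is 60% wider than the MLE's therefore has a width of roughly $1.6 \times 2.26 \approx 3.6$, which is still below the 3.92 of the stage-three estimate.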

5 Discussion 

In this paper, we have explored the issue of estimation for a multistage analog of the two-stage drop- 
the-loser design. Our main focus was to generalize the work of Cohen & Sackrowitz (1989) to enable 
efficient unbiased estimation of the selected treatment. In this regard, we can only claim to have 
been partially successful. The RB1 estimator is unbiased and has a lower variance than the final-stage 
estimator, but it is derived using an additional condition that is needed to overcome technical 
difficulties. Further work may reveal that this condition can be relaxed to yield more efficient 
unbiased estimators. Perhaps unsurprisingly, the RB1 estimator was shown to have a large MSE, due 
to its unbiasedness. The RB2 estimator was derived as an alternative; its less stringent conditioning 
(on the minimal sufficient statistic) resulted in an estimator with a small amount of bias but a greatly 
reduced MSE in the context of a 3:2:1 trial. However, its performance (in particular the coverage of 
its bootstrapped confidence intervals) worsened for K:2:1 trials as K increased. 

Our derivation of the RB1 and RB2 estimators assumes that the within treatment arm variances 
(the $\nu^2$s) and the number of patients randomized to each treatment arm at stage $j$ ($n_j$) are equal across 
treatments. This meant that ranking via the test statistic in Eq. (1) is equivalent to ranking by the 
mutually independent experimental treatment MLEs at each stage. Indeed, the independence property 
gained by ignoring the common control group data in the selection process is key to the proof. If these 
two conditions are not met then ranking via test statistic and MLE will not be equivalent, so the RB1 
and RB2 estimators as stated here will be invalid. Our development also assumes that the $\nu^2$ terms 
are known and a very different approach would be required if they were assumed unknown (Cohen 
& Sackrowitz, 1989). The BC-MLE approach of Bebu et al. (2010) is, in contrast, much better suited 
to the unknown variance case, since the variances can simply be included as additional parameters in the 
conditional likelihood. 

The multistage drop-the-losers design assumes that the trial always proceeds to the final stage. Whilst 
this could be criticized as nonsensical and inefficient when strong evidence exists to stop the trial early, 
it gives the trial a fixed sample size that may make it attractive to practitioners and funding bodies 
alike (Kairalla et al., 2012). One may, however, wish to augment this design with an efficacy/futility 
stopping rule, as do Stallard & Friede (2008) and Wu et al. (2010). Our approach can easily be adapted 
to yield an RB1 or RB2 estimator in this context, by recalculating (in the three stage case) exactly how 
the sampling space of $Y_{13}$ was restricted conditional on $Z_1$, $Y_{12}$, $Q$ and given the trial made it to the final 
stage. However, from an estimation perspective, conditioning on reaching the final stage only really 
makes sense when the trial can stop early for futility and not efficacy (Pepe et al., 2009). Moreover, in 
this case whilst estimation of $\mu_1$ is still trivial, unbiased estimation of the treatment control comparison 
$\mu_1 - \mu_0$ does not immediately follow, because the control group data must itself be corrected for some 
selection bias induced by the stopping rules. This extra complication has, however, been successfully 
addressed in recent work by Kimani et al. (2013). 

Further research is needed to explore and understand how best to design and analyze J-stage 
drop-the-losers trials from an operational planning and hypothesis testing perspective. Open questions 
include how to calculate the critical value for testing the selected treatment against the control at the 
final stage, whether it is possible to control the type I error rate of this test in the strong sense, and how 
to find multistage designs (e.g., the K and L in the three stage context) that are optimal in terms of maximal power and minimal 
size. Some preliminary work on this subject can be found in a technical report (Wason & Bowden, 
2012) available at https://sites.google.com/site/jmswason/supplementary-material. Software (in the 
form of R code) to reproduce our results can be found in the supplementary material accompanying 
this paper. 
Appendix: The RB1 estimator for a J-stage drop-the-losers trial 

In this section, we introduce a more general notation. Let $N_j$ represent the number of experimental 
treatment arms active in the trial at stage $j$, for $j = 1, \ldots, J$. Clearly $N_1 = K$ and $N_1 > N_2 > \cdots > 
N_J = 1$. Define $\Omega_j$ to be the set of all treatment arms in the trial at stage $j$, so that: 

$$\Omega_j = \{1, \ldots, N_j\}, \qquad \Omega_1 \supset \Omega_2 \supset \cdots \supset \Omega_J.$$
Let $Y_{uj}$ represent the effect estimate of treatment $u$ at stage $j$ for $u = 1, \ldots, N_j$. Let $\sigma_j$ be the standard 
deviation of the estimates at stage $j$. 

Let $Y_{11}, Y_{12}, \ldots, Y_{1J}$ represent the $J$ effect estimates associated with the (ultimately) selected 
treatment with true mean effect $\mu_1$. Let $Y^*_j$ represent the vector of the unselected treatment effect 
estimates at stage $j$. The vector of all treatment estimates across all stages has length 
$N = \sum_{j=1}^{J} N_j$. The MLE of $\mu_1$ at stage $j$, $\hat{\mu}_{1j}$, is equal to 

$$\hat{\mu}_{1j} = \frac{\sum_{u=1}^{j} w_u Y_{1u}}{\sum_{u=1}^{j} w_u}, \tag{A1}$$

where $w_u$ is the weight given to the stage-$u$ estimate (proportional to $1/\sigma_u^2$ for an MLE with known 
variances). 
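The cumulative MLE in Eq. (A1) can be sketched in a few lines of Python. The sketch assumes inverse-variance weights $w_u = 1/\sigma_u^2$, which is the usual choice for normally distributed estimates with known variances; the function name `cumulative_mle` is hypothetical.

```python
def cumulative_mle(estimates, sds):
    """Cumulative inverse-variance weighted means after stages 1, 2, ..., J.

    estimates : stage-wise effect estimates of one treatment arm
    sds       : stage-wise standard deviations of those estimates
    """
    weights = [1.0 / sd ** 2 for sd in sds]
    num = den = 0.0
    out = []
    for y, w in zip(estimates, weights):
        num += w * y            # running sum of w_u * Y_1u
        den += w                # running sum of w_u
        out.append(num / den)
    return out


# Toy usage: three stages with equal standard deviations.
print(cumulative_mle([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

With equal standard deviations the cumulative MLE reduces to the running average of the stage-wise estimates.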



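Define $Z_1 = A \sum_{u=1}^{J} w_u Y_{1u}$ and rewrite $Y_{1J}$ in terms of $Z_1$ and $Y_{11}, \ldots, Y_{1,J-1}$ as 

$$Y_{1J} = \frac{Z_1}{A w_J} - \frac{\sum_{u=1}^{J-1} w_u Y_{1u}}{w_J}.$$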



As for the three-stage example in Section 2, in the first $J - 1$ stages of a J-stage trial, we sequentially 
rank the treatment arms by the order of their cumulative MLEs, defined for each treatment remaining 
in the trial at stage $j$ as in Eq. (A1). The $j$-th stage imposes the restriction: 

$$\hat{\mu}_{1,j} > \hat{\mu}_{N_{j+1}+1,\,j}, \qquad j = 1, \ldots, J - 1.$$
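To see how each of these restrictions bounds the final-stage estimate, the following short derivation may help. It is a sketch that assumes $Z_1 = A \sum_{u=1}^{J} w_u Y_{1u}$ with $A \neq 0$, weights $w_u > 0$ that are common to all treatment arms, and writes $Y_{N_{j+1}+1,u}$ for the stage-$u$ estimate of the arm with the $(N_{j+1}+1)$-th largest cumulative MLE at stage $j$. Because the weights are common across arms, the restriction at stage $j$ is equivalent to

$$\sum_{u=1}^{j} w_u Y_{1u} > \sum_{u=1}^{j} w_u Y_{N_{j+1}+1,u}.$$

Substituting $\sum_{u=1}^{j} w_u Y_{1u} = Z_1/A - \sum_{u=j+1}^{J-1} w_u Y_{1u} - w_J Y_{1J}$ and rearranging gives

$$Y_{1J} < \frac{Z_1}{A w_J} - \frac{\sum_{u=1}^{j} w_u Y_{N_{j+1}+1,u}}{w_J} - \frac{\sum_{u=j+1}^{J-1} w_u Y_{1u}}{w_J},$$

so that all $J - 1$ restrictions hold together when $Y_{1J}$ lies below the smallest of these bounds.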



Given $Z_1$, when $Y_{1J}$ satisfies all of the $J - 1$ selection conditions required by event $Q$, it is restricted to 
be less than or equal to 

$$T = \min\left\{ \frac{Z_1}{A w_J} - \frac{\sum_{u=1}^{j} w_u Y_{N_{j+1}+1,u}}{w_J} - \frac{\sum_{u=j+1}^{J-1} w_u Y_{1u}}{w_J} \right\}_{j=1}^{J-1}, \tag{A2}$$

where $Y_{N_{j+1}+1,u}$ represents the treatment effect estimate from stage $u$ associated with the $(N_{j+1} + 1)$-th 
largest cumulative MLE at stage $j$ and the second summation is only defined and evaluated if $j < J - 1$. 
For example, in Section 2 we set $J = 3$ and conditions (a) and (b) on $Y_{13}$ correspond to setting $j$ equal to 1 
and 2 in formula (A2) respectively. This suggests that when conditioning $Y_{1J} \mid Z_1$ on $Q$, we additionally 
need to condition on $Y_1 = (Y_{12}, \ldots, Y_{1,J-1})$ when calculating the RB1 estimator. Let $Y_{1J}$, $Y_{11}$, $Y_1$ 
and $Y^* = (Y^*_1, \ldots, Y^*_{J-1})$ represent the complete data from the J-stage trial. Following the previous 
development in Section 3, we shall transform the multivariate normal densities as follows: 

$$f(Y_{1J}, Y_{11}, Y_1, Y^*) \rightarrow f(Y_{1J}, Z_1, Y_1, Y^*) \rightarrow f(Y_{1J} \mid Z_1, Y_1, Y^*).$$

Upon demonstrating that $Y_{1J}$ is independent of $Y^*$, $\mu$ given $Z_1$, $Y_1$ we finish by going from 

$$f(Y_{1J} \mid Z_1, Y_1) \rightarrow f(Y_{1J} \mid Z_1, Y_1, Q) \rightarrow E(Y_{1J} \mid Z_1, Y_1, Q).$$
The density $f(Y_{1J}, Z_1, Y_1, Y^*)$ is $\sim MVN_N(C\mu, C\Sigma C')$ where: 



/ Ml \ 

Ml 



Ml 



\ M* / 



A 
0 



A 



A 
0 



0 

A 
al 
0 



0 
0 
0 



a 



0 



0 

,*2 



Vo 



0 \ 



0 

0 a*-) 



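Letting $C\Sigma C' = \Sigma^*$ we define: 

$$\Sigma^* = \begin{pmatrix} \Sigma^*_{11} & \Sigma^*_{12} \\ \Sigma^*_{21} & \Sigma^*_{22} \end{pmatrix}$$

where $\Sigma^*_{11} = \sigma_J^2$, $\Sigma^*_{12}$ is equal to the $N - 1$ vector $(A w_J \sigma_J^2, 0, \ldots, 0) = \Sigma^{*\prime}_{21}$ and $\Sigma^*_{22}$ is the remaining 
$(N - 1) \times (N - 1)$ matrix. The conditional distribution $f(Y_{1J} \mid Z_1, Y_1, Y^*)$ is $\sim N(\mu, \sigma^2)$, where: 

$$\mu = \mu_1 + \Sigma^*_{12} \Sigma^{*-1}_{22} (a - \mu_a)$$

for $a = (Z_1, Y_1, Y^*)$, $\mu_a$ equal to the mean parameter vector of $a$ and $\sigma^2 = \Sigma^*_{11} - \Sigma^*_{12} \Sigma^{*-1}_{22} \Sigma^*_{21}$. Using 
block-wise inversion techniques (e.g. Srivastava, 2002, Corollary A.5.2) $\Sigma^{*-1}_{22}$ can be expressed as 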



M, 



-M^AM^ 



-{M^AMj}' A-M^M^ + Mj ' 

where Mj is the scaler (B^ - A^ J2i=2 ^u^^'^i — (^2' ■ • • > ^/-i) the values of M3 and M4 are 
unimportant. We now see that 



/X = Ml + 



^A/i (^Zi - ^Mi) - ^'Mi (^^2(712 - Ml), ... , Wj_,iY,j_, - f^O) 

= AM,(z,-AY]KYu)j^ 

and CT^ = (Ty — ^-Mj. Noting that this distribution does not depend on fi or we write 
/(7i^|Zi,Yi,0 as 



and £'[7i^|Zi,Yi,e] equals 



fij = — a 



(A3) 
The J-stage RB2 estimator could be derived as in Section 3.2, by ignoring the first $J - 2$ selection 
steps that depend on $(Y_{12}, \ldots, Y_{1,J-1})$. However, the bias of the RB2 estimator is likely to be increasing 
as a function of $J$. 
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